A METHOD AND SYSTEM FOR IMPLEMENTING A FLOATING POINT COMPARE USING RECORDED FLAGS



TECHNICAL FIELD 

The present invention relates generally to digital microprocessors. More
specifically, the present invention pertains to efficiently implementing hardware
support for floating point math operations in a digital microprocessor.

BACKGROUND ART 

Floating point math is used in computer system applications for quickly calculating
numbers across a large range. Generally, floating point math refers to a method for
storing and calculating numbers in which the radix points do not line up as they do in
fixed point numbers. The significant digits are stored as a unit called the "mantissa," and
the location of the radix point (e.g., the decimal point in base-10) is stored in a separate
unit called the "exponent." Floating point operations can be implemented in
hardware (e.g., a floating point unit of a microprocessor), or they can be done in
software. In large systems, they can also be performed in a separate floating point
processor that is connected to the main processor via a bus. IEEE Standard 754
floating point (also known as IEC 559 or IEC 60559, and generalized to a
radix-independent form by IEEE 854) is the most common
representation for real numbers on computers, including x86 based PCs,
Macintoshes, and most Unix platforms. With respect to x86 and/or x87 based
computer systems, the implementation of IEEE 754 floating point math is inefficient.
This is problematic since many different types of computer system applications
require high-performance floating point calculations. What is required is an efficient
implementation of IEEE 754 floating point math on x86 based computer systems.
DISCLOSURE OF THE INVENTION

Embodiments of the present invention provide a method and system for 
implementing a floating point compare instruction using recorded flags. 
BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are incorporated in and form a part of this 
specification, illustrate embodiments of the invention and, together with the 
description, serve to explain the principles of the invention: 

Figure 1 shows a flow diagram depicting the operation of two floating point 
compare instructions in accordance with one embodiment of the present invention. 

Figure 2 shows a diagram of the encoding logic required to encode the flags
SF and OF to reflect the results of a floating point compare instruction in accordance
with one embodiment of the present invention.

Figure 3 shows a diagram of an exemplary processor architecture in 
accordance with one embodiment of the present invention. 

Figure 4 shows a diagram of a computer system in accordance with one 
embodiment of the present invention. 
DETAILED DESCRIPTION OF THE EMBODIMENTS

Reference will now be made in detail to the preferred embodiments of the
present invention, examples of which are illustrated in the accompanying drawings.
While the invention will be described in conjunction with the preferred embodiments,
it will be understood that they are not intended to limit the invention to these
embodiments. On the contrary, the invention is intended to cover alternatives,
modifications and equivalents, which may be included within the spirit and scope of
the invention as defined by the appended claims. Furthermore, in the following
detailed description of embodiments of the present invention, numerous specific
details are set forth in order to provide a thorough understanding of the present
invention. However, it will be recognized by one of ordinary skill in the art that the
present invention may be practiced without these specific details. In other instances,
well-known methods, procedures, components, and circuits have not been described
in detail so as not to unnecessarily obscure aspects of the embodiments of the present
invention.

Embodiments of the present invention implement a method and system for
providing hardware support for floating point compare operations in an x86
compatible processor. The method includes the step of comparing a first bit pattern
and a second bit pattern (where each pattern represents a floating point number or
other entity) using a floating point unit of an x86 compatible processor. The EFLAGS
register is set in accordance with a result of the comparison (e.g., as the result of an
FCOMI instruction). A sign flag (SF) and an overflow flag (OF) of the EFLAGS
register are encoded with information derived from the result of the comparison.
Embodiments of the present invention and their benefits are further described below. 

Notation and Nomenclature

Some portions of the detailed descriptions which follow are presented in terms
of procedures, steps, logic blocks, processing, and other symbolic representations of
operations on data bits within a computer memory. These descriptions and
representations are the means used by those skilled in the data processing arts to
most effectively convey the substance of their work to others skilled in the art. A
procedure, computer executed step, logic block, process, etc., is here, and generally,
conceived to be a self-consistent sequence of steps or instructions leading to a
desired result. The steps are those requiring physical manipulations of physical
quantities. Usually, though not necessarily, these quantities take the form of
electrical or magnetic signals capable of being stored, transferred, combined,
compared, and otherwise manipulated in a computer system. It has proven
convenient at times, principally for reasons of common usage, to refer to these
signals as bits, values, elements, symbols, characters, terms, numbers, or the like.

It should be borne in mind, however, that all of these and similar terms are to
be associated with the appropriate physical quantities and are merely convenient
labels applied to these quantities. Unless specifically stated otherwise as apparent
from the following discussions, it is appreciated that throughout the present invention,
discussions utilizing terms such as "storing" or "accessing" or "recognizing" or
"retrieving" or "translating" or the like refer to the actions and processes of a computer
system, or similar electronic computing device, that manipulates and transforms data
represented as physical (electronic) quantities within the computer system's registers
and memories into other data similarly represented as physical quantities within the
computer system memories or registers or other such information storage,
transmission or display devices.

Embodiments of the Present Invention

Figure 1 shows a flow diagram depicting the operation of two floating point
compare instructions in accordance with one embodiment of the present invention.
As depicted in Figure 1, block 110 shows the operation of a floating point compare
instruction as implemented by a floating point unit of an x86 compatible
microprocessor. Block 120 depicts the operation of a floating point compare
instruction as implemented by an integer unit of the x86 compatible microprocessor.

As shown in block 110, a floating point compare (FCOM) of two input bit
patterns 112-113 is executed by a compare unit 114. The result of the comparison is
stored into a floating point status register 115. As known by those skilled in the art, in
an x86 compatible microprocessor, the floating point status register 115 includes
three flags 116-118, which are referred to as C0, C2, and C3, as shown. As known
by those skilled in the art, the three flags 116-118 are used to encode the result of
the comparison performed by the floating point compare unit 114.

Generally speaking, in an x86/x87 compatible processor, the three flags 116-118
are used to encode the four possible conditions that result from the FCOM instruction:
less than ("<"), equal ("="), greater than (">"), and unordered ("U"). In
accordance with IEEE 754, "U" results when at least one of the operands of the
compare (e.g., bit pattern 112 or bit pattern 113) is a "NaN" (Not-a-Number) as
defined by the specification (an invalid number, usually representing an error
condition). When a floating point compare instruction like FCOM is executed,
including the variant instructions FCOMP, FCOMPP, FUCOM, FUCOMP, or
FUCOMPP ("FCOM instructions"), the flags 116-118 are set to indicate the result.
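
The encoding of the four outcomes onto C3, C2, and C0 can be modelled in Python as
shown below. This is a behavioral sketch only: the function name fcom and the type
FcomFlags are hypothetical, the bit values follow the published FCOM condition-code
table for comparing ST(0) with a source operand, and C1 and stack-fault behavior are
omitted.

```python
import math
from typing import NamedTuple


class FcomFlags(NamedTuple):
    """x87 condition codes written by an FCOM-style compare (C1 omitted)."""
    c3: int
    c2: int
    c0: int


def fcom(a: float, b: float) -> FcomFlags:
    """Model how FCOM encodes the compare of ST(0) = a with a source b."""
    if math.isnan(a) or math.isnan(b):
        return FcomFlags(c3=1, c2=1, c0=1)  # unordered ("U")
    if a > b:
        return FcomFlags(c3=0, c2=0, c0=0)  # greater than
    if a < b:
        return FcomFlags(c3=0, c2=0, c0=1)  # less than
    return FcomFlags(c3=1, c2=0, c0=0)      # equal (includes +0.0 vs -0.0)


for a, b in [(2.0, 1.0), (1.0, 2.0), (1.0, 1.0), (math.nan, 1.0)]:
    print(a, b, fcom(a, b))
```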

In more recent processors, instructions such as FCOMI, FCOMIP, FUCOMI,
FUCOMIP, COMISD, COMISS, UCOMISD and UCOMISS ("FCOMI instructions")
can be used to execute a floating point compare and to place the result
into the integer unit of the processor. As shown in block 120, when an FCOMI
instruction is performed, the result of the floating point compare performed by the unit
114 is used to set flags in the integer unit of the processor. This is depicted in Figure
1 as the results of the floating point compare unit 114 being transferred to the floating
point status register 115 and to the EFLAGS register 125. The status bits that
correspond to the flags C0, C2, and C3 map directly to the flags CF, PF, and ZF,
respectively, of the EFLAGS register 125. With the standard FCOMI instructions, the
flags SF and OF are not used and are set either to zero or to an undefined value.
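
The remapping from the x87 condition codes to the integer flags can be sketched in
Python as follows. This is a behavioral sketch, not a hardware description: the function
and constant names are hypothetical, the bit positions are the architectural positions of
C0, C2, and C3 in the x87 status word and of CF, PF, ZF, SF, and OF in EFLAGS, and SF
and OF are assumed to be cleared rather than left undefined.

```python
# Architectural bit positions (x87 FPU status word and EFLAGS register).
FSW_C0, FSW_C2, FSW_C3 = 8, 10, 14
EFLAGS_CF, EFLAGS_PF, EFLAGS_ZF, EFLAGS_SF, EFLAGS_OF = 0, 2, 6, 7, 11

# C0 -> CF, C2 -> PF, C3 -> ZF
C_FLAG_TO_EFLAG = ((FSW_C0, EFLAGS_CF), (FSW_C2, EFLAGS_PF), (FSW_C3, EFLAGS_ZF))


def legacy_fcomi_eflags(fsw: int, eflags: int = 0) -> int:
    """Hypothetical model: copy the C0/C2/C3 bits of an x87 status word into
    CF/PF/ZF of `eflags`, then clear SF and OF (assumed zero, not undefined)."""
    result = eflags
    for src, dst in C_FLAG_TO_EFLAG:
        bit = (fsw >> src) & 1
        result = (result & ~(1 << dst)) | (bit << dst)
    return result & ~((1 << EFLAGS_SF) | (1 << EFLAGS_OF))


unordered_fsw = (1 << FSW_C3) | (1 << FSW_C2) | (1 << FSW_C0)  # 0x4500
print(hex(legacy_fcomi_eflags(unordered_fsw)))                # 0x45: ZF, PF, CF set
```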

Referring still to Figure 1, embodiments of the present invention provide a
solution to the problematic execution of floating point compare instructions in x86
compatible processors. The basic problem is that the IEEE 754
floating point comparison instructions are implemented poorly on the Intel
architecture. The standard Intel floating point architecture is a holdover from the Intel
x87 line of math coprocessors, which were later incorporated into the main processor
as floating point units.

As known by those skilled in the art, the standard Intel architecture encodes the
outputs of a floating point instruction, specifically the "floating point compare"
(FCOM) instruction and its related instructions, using three bits 116-118. The bits are
referred to as flags "C0", "C2", and "C3" as described above. The two input bit
patterns 112-113 (which may or may not be valid numbers) are compared using a
floating point compare instruction that executes in accordance with the IEEE 754
standard. The result of the compare is encoded using the flags 116-118. These
flags are made available to the rest of the processor.

Block 120 shows an FCOMI instruction which places the results of the
comparison (e.g., performed by the unit 114) into the EFLAGS register 125. The
FCOMI instructions can be very useful in that the result of the instruction is made
available to the hardware of the integer unit through the EFLAGS register 125. The
integer unit hardware includes processor hardware that performs branch instructions,
conditional move instructions, value-setting instructions (e.g., the x86 SETcc series of
instructions), or other predicated instructions. These types of instructions are used
by software code to act upon a condition (an essential requirement for any nontrivial
piece of software code). Branch instructions directly affect the control flow of a
program (e.g., if one floating point number is greater than or equal to another floating
point number, continue what you are doing, else jump to this address); conditional
move, SETcc, or other predicated instructions are used to avoid these relatively
expensive control flow changes. Since these types of instructions tend to occur
frequently, it is very important that the integer unit hardware that implements these
instructions be highly optimized for rapid execution.

Unfortunately, as known by those skilled in the art, instructions which involve
the parity flag (PF) in their expression are particularly problematic. The standard x86
condition codes test PF only on its own (e.g., JP/JNP, SETP/SETNP, CMOVP/CMOVNP);
none tests PF in combination with another flag. As a result, there is no efficient branch
programming methodology that can make use of the parity flag PF
127 in combination with any of the other flags (e.g., some combination of flag 127
with flag 126 or 128). For example, an IEEE 754 "equal" test after an FCOMI
instruction must check both that ZF is set and that PF is clear, which cannot be
expressed as a single condition. In the prior art, any branch expression, conditional
move, set operation or other predicated operation which includes the flag PF in
conjunction with any other flag requires more code than would otherwise be desirable.
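
To make the limitation concrete, the following Python sketch models the sixteen
standard x86 condition codes as predicates over the flags and applies them to the flag
values a legacy FCOMI instruction produces for each outcome, assuming SF and OF are
cleared. All names are hypothetical. It collects the nontrivial sets of outcomes (neither
empty nor all four) that a single condition can select; the set for IEEE 754 "equal" is
not among them.

```python
# Flag values written by a legacy FCOMI for each outcome of comparing a with b.
# SF and OF are modelled as cleared (an assumption).
LEGACY_FLAGS = {
    ">": dict(ZF=0, PF=0, CF=0, SF=0, OF=0),
    "<": dict(ZF=0, PF=0, CF=1, SF=0, OF=0),
    "=": dict(ZF=1, PF=0, CF=0, SF=0, OF=0),
    "U": dict(ZF=1, PF=1, CF=1, SF=0, OF=0),
}

# The sixteen x86 condition codes (Jcc/SETcc/CMOVcc suffixes) as flag predicates.
X86_CONDITIONS = {
    "O":  lambda f: f["OF"] == 1,
    "NO": lambda f: f["OF"] == 0,
    "B":  lambda f: f["CF"] == 1,
    "AE": lambda f: f["CF"] == 0,
    "E":  lambda f: f["ZF"] == 1,
    "NE": lambda f: f["ZF"] == 0,
    "BE": lambda f: f["CF"] == 1 or f["ZF"] == 1,
    "A":  lambda f: f["CF"] == 0 and f["ZF"] == 0,
    "S":  lambda f: f["SF"] == 1,
    "NS": lambda f: f["SF"] == 0,
    "P":  lambda f: f["PF"] == 1,
    "NP": lambda f: f["PF"] == 0,
    "L":  lambda f: f["SF"] != f["OF"],
    "GE": lambda f: f["SF"] == f["OF"],
    "LE": lambda f: f["ZF"] == 1 or f["SF"] != f["OF"],
    "G":  lambda f: f["ZF"] == 0 and f["SF"] == f["OF"],
}


def selectable_outcome_sets(flags_by_outcome):
    """Map each nontrivial set of outcomes that some single x86 condition
    selects to the name of the first such condition found."""
    everything = frozenset(flags_by_outcome)
    found = {}
    for name, pred in X86_CONDITIONS.items():
        chosen = frozenset(o for o, f in flags_by_outcome.items() if pred(f))
        if chosen and chosen != everything:
            found.setdefault(chosen, name)
    return found


legacy = selectable_outcome_sets(LEGACY_FLAGS)
print(len(legacy))                 # 8 of the 14 nontrivial sets
print(frozenset({"="}) in legacy)  # False: ordered "equal" needs two flag tests
```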
Embodiments of the present invention solve this problem by taking the result
combinations (e.g., of FCOMI) that include PF and encoding them onto the
other status bits 129 and 130 (e.g., SF and OF), which are otherwise unused by the
FCOMI instruction. This allows efficient branches, conditional moves, set operations,
and other predicated operations, since result combinations that include PF in their
expression are now reflected in the new SF and OF (e.g., SF' and OF'). The process
of encoding SF and OF in accordance with the present invention is diagrammed in
Figure 2.

Figure 2 shows a diagram of the encoding logic 200 required to encode the
flags SF and OF to reflect the results of a floating point compare instruction in
accordance with one embodiment of the present invention. As depicted in Figure 2,
the encoding logic 200 comprises combinational logic (e.g., logic gates) that derives
the flags SF' and OF' as shown.

Thus, the new flags SF' and OF' are computed as:

SF' := (~PF) & CF

OF' := (~PF) & (CF | ZF)

... where & represents boolean AND, | represents boolean inclusive OR, and ~
represents boolean NOT. Now all fourteen nontrivial combinations of IEEE
comparison results (every set of the four outcomes <, =, >, and U other than the
empty set and the set of all four), including the six normal comparison operations
defined by, among others, the "C" computer language (ISO/IEC 9899), map directly
onto standard x86 conditions that can be efficiently used by the integer hardware,
while still being fully compliant with the IEEE 754 specification.
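
The claim for the six C operators can be checked with a short Python model. It is a
sketch under stated assumptions: the function names are hypothetical, the FCOMI flag
values per outcome follow the legacy ZF/PF/CF encoding described for Figure 1, and the
condition chosen for each operator (L for ==, GE for !=, S for <, O for <=, A for >, AE
for >=) is one valid choice derived from the SF' and OF' equations above rather than a
mapping stated in the original disclosure.

```python
import math
import operator


def fcomi_flags(a: float, b: float) -> dict:
    """Legacy FCOMI result flags for comparing a with b (SF/OF not yet set)."""
    if math.isnan(a) or math.isnan(b):
        return dict(ZF=1, PF=1, CF=1)  # unordered
    if a < b:
        return dict(ZF=0, PF=0, CF=1)
    if a == b:
        return dict(ZF=1, PF=0, CF=0)
    return dict(ZF=0, PF=0, CF=0)      # a > b


def encode_sf_of(flags: dict) -> dict:
    """Add SF' = ~PF & CF and OF' = ~PF & (CF | ZF), as in encoding logic 200."""
    not_pf = 1 - flags["PF"]
    return dict(flags,
                SF=not_pf & flags["CF"],
                OF=not_pf & (flags["CF"] | flags["ZF"]))


# One x86 condition per C comparison operator, evaluated on the new flags.
C_OPERATOR_TO_CONDITION = {
    operator.eq: ("L",  lambda f: f["SF"] != f["OF"]),
    operator.ne: ("GE", lambda f: f["SF"] == f["OF"]),
    operator.lt: ("S",  lambda f: f["SF"] == 1),
    operator.le: ("O",  lambda f: f["OF"] == 1),
    operator.gt: ("A",  lambda f: f["CF"] == 0 and f["ZF"] == 0),
    operator.ge: ("AE", lambda f: f["CF"] == 0),
}


def compare_with_one_condition(op, a: float, b: float) -> bool:
    """Evaluate `a op b` from the recorded flags using a single condition."""
    _, predicate = C_OPERATOR_TO_CONDITION[op]
    return predicate(encode_sf_of(fcomi_flags(a, b)))
```

Python's own float comparisons follow IEEE 754 (every ordered comparison involving a
NaN is false, and != is true), so checking compare_with_one_condition(op, a, b) against
op(a, b) over samples that include NaN, signed zeros, and infinities exercises all four
outcomes. For example, compare_with_one_condition(operator.eq, math.nan, math.nan)
returns False.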

In one embodiment, the new flags SF' and OF' are recorded directly into the
flags 129 and 130 (e.g., the SF flag and the OF flag) of the EFLAGS register 125.
This makes the values immediately available to the hardware of the integer unit (e.g.,
the branch hardware, etc.).

Figure 3 shows a diagram of an exemplary processor architecture 300 in
accordance with one embodiment of the present invention. As depicted in Figure 3,
a floating point unit (FPU) 301 is coupled to provide the results of a floating point
compare instruction (e.g., FCOMI) to the EFLAGS register 125. In the present
embodiment, the FPU 301 includes logic (e.g., logic 200 shown in Figure 2) for
computing SF' and OF' as described above. These results are provided to a register
302. A multiplexer 303 can selectively couple either the standard output of the EFLAGS
register 125 to the branch unit 305, or the flags of the register 125 plus the new
values SF' and OF' (replacing the old values SF and OF) to the branch unit 305. The
selection is made in accordance with a control input 304. This allows the new
capability provided by the inclusion of the new values SF' and OF' to be hidden away
from other circuitry of the processor, for example, on those occasions where strict
legacy x86 compatibility must be assured.
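
The selection performed by multiplexer 303 can be modelled in a few lines of Python.
In this sketch the function name is hypothetical, control input 304 is modelled as a
boolean, the contents of register 302 are modelled as two single-bit integers, and SF
and OF are placed at their architectural EFLAGS bit positions (7 and 11).

```python
EFLAGS_SF, EFLAGS_OF = 7, 11  # architectural bit positions of SF and OF


def flags_to_branch_unit(eflags: int, sf_new: int, of_new: int,
                         use_recorded_flags: bool) -> int:
    """Model of multiplexer 303: pass EFLAGS through unchanged (legacy view),
    or substitute the recorded SF'/OF' values for SF/OF."""
    if not use_recorded_flags:
        return eflags  # strict legacy x86 view
    mask = (1 << EFLAGS_SF) | (1 << EFLAGS_OF)
    return (eflags & ~mask) | (sf_new << EFLAGS_SF) | (of_new << EFLAGS_OF)
```

For example, after an ordered "equal" result (EFLAGS 0x40, SF' = 0, OF' = 1), the branch
unit sees 0x40 in legacy mode and 0x840 when the recorded flags are selected.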
Computer System Platform

With reference now to Figure 4, a computer system 400 in accordance with
one embodiment of the present invention is shown. Computer system 400 shows the
general components of a computer system in accordance with one embodiment of
the present invention that provides the execution platform for implementing certain
software-based functionality of the present invention. As described above, certain
processes and steps of the present invention are realized, in one embodiment, as a
series of instructions (e.g., a software program) that reside within computer readable
memory units of a computer system (e.g., system 400) and are executed by the CPU
401 of system 400. When executed, the instructions cause the system 400 to
implement the functionality of the present invention as described above.

In general, system 400 comprises at least one CPU 401 coupled to a North
bridge 402 and a South bridge 403. The North bridge 402 provides access to system
memory 415 and a graphics unit 410 that drives a display 411. The South bridge 403
provides access to a plurality of coupled peripheral devices 431 through 433 as
shown. Computer system 400 also shows a BIOS ROM 440 that stores BIOS
initialization software.

The foregoing descriptions of specific embodiments of the present invention
have been presented for purposes of illustration and description. They are not
intended to be exhaustive or to limit the invention to the precise forms disclosed, and
obviously many modifications and variations are possible in light of the above
teaching. The embodiments were chosen and described in order to best explain the
principles of the invention and its practical application, to thereby enable others
skilled in the art to best utilize the invention and various embodiments with various
modifications as are suited to the particular use contemplated. It is intended that the
scope of the invention be defined by the claims appended hereto and their
equivalents.



TRAN-068

6/25/03



